Introduction

GitHub Repository: https://github.com/mtdunphy-umd/SURV727-Final-Project

Since the controversial 2010 decision in Citizens United v. Federal Election Commission (FEC), the amount of money spent on political campaigns has risen sharply. In the 2010 midterm election, in which Republicans picked up a half dozen seats in the Senate and more than 60 seats in the House, political expenditures totaled around $3.6 billion. In 2022, that number is projected to be around $8.9 billion.

Candidates with more money win more often, so fundraising has become a critical indicator for forecasting elections and understanding the current political climate.

Major partisan events that dominate the broader political discussion often cause a mass influx of money to political campaigns. In each fundraising cycle, some candidates perform above or below expectations given candidate quality and the overall political climate. Our goal is to identify trends in contributions that correlate with significant political shocks during the 2022 midterm cycle, as well as characteristics of candidates that affected fundraising performance. In our analysis, we build three linear models to identify which characteristics were significant in fundraising among House candidates, Senate candidates, and candidates running in competitive districts. We also build three logistic models predicting the chance of winning the general election among the same groups, controlling for the characteristics of their districts. We conduct principal component analysis to identify which variables are the most predictive for our model. Finally, we use data visualizations to compare top fundraisers by party and office and to plot fundraising over the 2022 election year by day and week.

Data

Data collected for this analysis came from multiple sources. To start, we collected data from 538’s “2022 Primary Project” GitHub repository, which tracked the 2022 midterm primary candidates at the federal level. We processed this data to extract general election candidates by keeping only candidates who won their primary election (or, where one was held, their runoff election). This yielded 900 candidates, with information on endorsements, incumbency status, race, and gender.
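The filtering step can be sketched as follows. This is a minimal illustration on a toy data frame, not the project's actual processing code: the column names `primary_outcome` and `runoff_outcome` and their values are assumptions made for the example and may not match the 538 file.

```r
# Toy stand-in for the 538 primary data; column names and values are hypothetical.
primaries <- data.frame(
  candidate       = c("A", "B", "C", "D"),
  primary_outcome = c("Won", "Lost", "Made runoff", "Made runoff"),
  runoff_outcome  = c(NA, NA, "Won", "Lost"),
  stringsAsFactors = FALSE
)

# A candidate advances to the general election by winning the primary
# outright or, where a runoff was held, by winning the runoff.
won_primary <- primaries$primary_outcome == "Won"
won_runoff  <- !is.na(primaries$runoff_outcome) & primaries$runoff_outcome == "Won"

general <- primaries[won_primary | won_runoff, ]
print(general$candidate)
```

Checking the runoff column explicitly for `NA` keeps candidates without a runoff from being dropped or matched by accident.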

Contributions to candidates were provided by the FEC, which tracks federal-level campaign finance information. We tried using the FEC API through the R.openFEC package in R but could not process the data in a reasonable time, so we switched to pulling data straight from the FEC website. This data included ‘Candidate-committee linkages’ and ‘Contributions by individuals’ from the bulk data webpage. The 538 general election candidate data set was joined with the FEC candidate-committee linkages data set by office, state, district, and candidate last name to get the FEC candidate IDs and committee IDs used later in the analysis.
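A join on office, state, district, and last name might look like the sketch below. It uses base R and made-up rows; the lowercase column names, the ID values, and the `"00"` district code for Senate seats are assumptions for illustration rather than the project's actual schema.

```r
# Toy stand-ins; every value below is made up for illustration.
candidates <- data.frame(
  office    = c("Representative", "Senator", "Representative"),
  state     = c("MD", "OH", "CA"),
  district  = c("05", "00", "12"),
  last_name = c("Doe", "Roe", "Lee"),
  stringsAsFactors = FALSE
)

fec_links <- data.frame(
  office    = c("Representative", "Senator", "Senator"),
  state     = c("MD", "OH", "TX"),
  district  = c("05", "00", "00"),
  last_name = c("DOE", "ROE", "POE"),
  CAND_ID   = c("H0XX00001", "S0XX00002", "S0XX00003"),
  CMTE_ID   = c("C00000001", "C00000002", "C00000003"),
  stringsAsFactors = FALSE
)

# Normalize the name key so that case differences do not block a match.
candidates$last_name <- toupper(candidates$last_name)

# Left join: keep every candidate even when no FEC match is found,
# so unmatched rows (NA IDs) can be reviewed by hand.
linked <- merge(candidates, fec_links,
                by = c("office", "state", "district", "last_name"),
                all.x = TRUE)
print(linked[, c("last_name", "CAND_ID", "CMTE_ID")])
```

Joining on last name is fragile (suffixes, hyphenated names, shared surnames), so counting the rows with a missing `CAND_ID` after the join is a cheap check on how many candidates still need manual matching.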

Individual contribution bulk data was brought into R and found to have discrepancies and errors. Because of this, we manually pulled individual receipts for 2022 state by state using custom filters and combined them in R. Receipt data runs through the last FEC report date, 10/19/2022. This data was joined with our candidate data set on candidate ID and committee ID to view total fundraising numbers for each candidate as well as the timeline of contributions.

Google Trends data was retrieved with the gtrendsR R package to identify key issues that were highly salient in this election cycle. We wanted to focus on the issue of abortion and on Trump’s influence, so we searched the following terms: ‘abortion’, ‘supreme court’, ‘Trump’, ‘FBI’, and ‘crime’. Crime was included as a comparison because it was another major issue for many voters this cycle.

District-level data was provided by Dave’s Redistricting App (DRA) for each state. This data was manually collected for each state and entered into this Google Sheet. The sources of data and the processing done to calculate these values for districts can be viewed on DRA’s website. The data used included the percentage of white voters in 2020, the 2020 presidential vote share for each party, and the composite score for each party in each district. Composite scores are the mean share of the vote in presidential, Senate, governor, and attorney general races from 2016 to 2020 for each party (2016 to 2021 for districts in New Jersey and Virginia). Only Utah’s districts did not have composite score data.
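As a worked example of a composite score computed as a plain mean, the sketch below uses made-up vote shares for a single hypothetical district. The set of races and the numbers are assumptions, and DRA's own calculation may differ in detail.

```r
# Hypothetical vote shares for one district (made-up numbers).
dem_share <- c(pres_2016 = 0.48, pres_2020 = 0.52, sen_2018 = 0.50,
               gov_2018 = 0.47, ag_2018 = 0.53)
gop_share <- c(pres_2016 = 0.49, pres_2020 = 0.46, sen_2018 = 0.48,
               gov_2018 = 0.51, ag_2018 = 0.45)

# Composite score: the mean share across the included races, per party.
dem_comp <- mean(dem_share)
gop_comp <- mean(gop_share)

# Composite margin (DEM - GOP); positive values lean Democratic.
margin <- dem_comp - gop_comp
print(c(dem_comp = dem_comp, gop_comp = gop_comp, margin = margin))
```

The two composites need not sum to one, since third-party votes are left out of both shares.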

Election result data was manually collected from the New York Times Midterm Tracker for House and Senate candidates and entered into this Google Sheet. During this process, we identified discrepancies between our candidate data and the information shown by the NYT (a deceased House incumbent and districts that had no opponent). We used this information to filter districts in the final data set used for analysis.

# load 538 primary candidates
dem <- read.csv("https://raw.githubusercontent.com/fivethirtyeight/data/master/primary-project-2022/dem_candidates.csv")
rep <- read.csv("https://raw.githubusercontent.com/fivethirtyeight/data/master/primary-project-2022/rep_candidates.csv")

# District data provided by Dave's Redistricting App https://davesredistricting.org/maps#home
# Each state's congressional district data was manually exported and combined into a csv
dra_house <- read.csv(paste(wd, "/input data/", "DRA Data - House Districts .csv", sep=''))
dra_senate <- read.csv(paste(wd, "/input data/", "DRA Data - Senate Districts.csv", sep=''))

# Collecting data from the FEC
# candidate data for the 2021-2022 cycle pulled from here: https://www.fec.gov/data/browse-data/?tab=bulk-data
# description of file: https://www.fec.gov/campaign-finance-data/all-candidates-file-description/

header <- "CAND_ID|CAND_NAME|CAND_ICI|PTY_CD|CAND_PTY_AFFILIATION|TTL_RECEIPTS|TRANS_FROM_AUTH|TTL_DISB|TRANS_TO_AUTH|COH_BOP|COH_COP|CAND_CONTRIB|CAND_LOANS|OTHER_LOANS|CAND_LOAN_REPAY|OTHER_LOAN_REPAY|DEBTS_OWED_BY|TTL_INDIV_CONTRIB|CAND_OFFICE_ST|CAND_OFFICE_DISTRICT|SPEC_ELECTION|PRIM_ELECTION|RUN_ELECTION|GEN_ELECTION|GEN_ELECTION_PRECENT|OTHER_POL_CMTE_CONTRIB|POL_PTY_CONTRIB|CVG_END_DT|INDIV_REFUNDS|CMTE_REFUNDS"

# read the raw file as one string and prepend the header line (read_file() is from readr)
base <- read_file(paste(wd, "/input data/", "weball22.txt", sep=''))

init <- file(paste(wd, "/output data/", "weball22_header.txt", sep=''))
writeLines(c(header, base), init)
close(init)

fec_candidate_info <- read.table(paste(wd, "/output data/", "weball22_header.txt", sep=''), sep= "|", header=TRUE)

# fec data has some misaligned columns using the read.table function. Data was imported to google sheets and manually modified to get columns in the correct order.
# google sheet: https://docs.google.com/spreadsheets/d/150dhkj1xrFwfi43ouYqu0LFLcj4jRMetXSRucYLTDIk/edit?usp=sharing

fec_candidate_info_fixed <- read.csv(paste(wd, "/input data/", "FEC Candidate Data - Fixed.csv", sep=''))
fec_candidate_info_fixed$FEC_index <- row.names(fec_candidate_info_fixed)

# contributions by individuals downloaded in bulk from here: https://www.fec.gov/data/browse-data/?tab=bulk-data
# description of file: https://www.fec.gov/campaign-finance-data/contributions-individuals-file-description/
# the file was too large to be imported into the github repository
# the file can be found in this google folder: https://drive.google.com/drive/folders/172eM7HDJ1CMVMPgpaN74dY3bzY9piSsz?usp=sharing
# the folder can be added to the input folder in the repository to reproduce the results
# added header info to the top of the itcont.txt file
fec_receipts <- read.table(paste(wd, "/input data/indiv22/", "itcont.txt", sep=''), sep= "|", header=TRUE, fill=TRUE)
write.csv(fec_receipts, paste(wd, "/output data/", "fec_receipts.csv", sep=''))

# the bulk data set showed discrepancies and errors so the data was manually pulled by individual states and joined together.
# the files were too large to be imported into the github repository
# the file can be found in this google folder: https://drive.google.com/drive/folders/10LzNSv9ucwlUsFGJbJP6nx2qP5cVkLFJ?usp=sharing
# the folder can be added to the input folder in the repository to reproduce the results
folder <- paste(wd, "/input data/FEC Receipts", sep='')
# pattern is a regular expression, not a glob: match names ending in .csv
csv_files <- list.files(folder, pattern = "\\.csv$")

df_list <- list()
for (file in csv_files) {
  df <- read_csv(file.path(folder, file), show_col_types = FALSE) %>%
          mutate_all(as.character)
  df_list[[file]] <- df
}
## New names:
## • `committee_name` -> `committee_name...2`
## • `committee_name` -> `committee_name...9`
fec_receipts_combined <- dplyr::bind_rows(df_list)
write.csv(fec_receipts_combined, paste(wd, "/output data/", "fec_receipts_combined.csv", sep=''))

# to link candidates to their committees, FEC provides a bulk data set that can be found here under 'Candidate-committee linkages': https://www.fec.gov/data/browse-data/?tab=bulk-data
# a description of the file can be found here: https://www.fec.gov/campaign-finance-data/candidate-committee-linkage-file-description/

# the ccl.txt file was modified to include the header: ccl_header_file.csv
fec_candidate_committees <- read.table(paste(wd, "/input data/", "ccl.txt", sep=''), sep= "|", header=TRUE, fill=TRUE)

# select candidate id, committee id, linkage id
fec_candidate_committees_trimmed <- fec_candidate_committees %>%
                                      select(CAND_ID, CMTE_ID, LINKAGE_ID)

write.csv(fec_candidate_committees_trimmed, paste(wd, "/output data/", "fec_candidate_committees_trimmed.csv", sep=''))

# Election results were collected manually off of the NYT Election tracker: https://www.nytimes.com/interactive/2022/11/08/us/elections/results-senate.html?action=click&pgtype=Article&state=default&module=election-results&context=election_recirc&region=NavBar
# Data was collected and entered into this spreadsheet: https://docs.google.com/spreadsheets/d/1azMpRjQ9sRgW_ULf6qvpouKnwAHWRi_eY1U-RnLPKjI/edit?usp=sharing

nyt_election_results <- read.csv(paste(wd, "/input data/", "NYT Election Tracker Data - General Election Candidates.csv", sep='')) %>%
                          select(State, Office, District, Candidate, Winner.)

All data processing can be viewed in the R Markdown file in the GitHub repository.

Results

This section presents the main results.

Data exploration

In this section, we conduct exploratory data analysis to better understand the data. This includes visualizing keyword trends using Google Trends, plotting fundraising amounts over time, comparing top fundraising candidates for each party, and comparing origin and destination contribution amounts by state.

# take filtered data and join it with receipt data to view contributions over time
final_receipt_date <- left_join(filtered_data, fec_receipts_combined.trimmed, by = c("CMTE_ID" = "committee_id")) %>%
                        select(Candidate, State, State_processed, District, Party, `2020.White.%`, Pres_2020, Comp, `Dem-Rep.2020`, `Dem-Rep.Comp`, Gender.Num, White.Num, Black.Num, Asian.Num, Latino.Num, Middle_Eastern.Num, Native_American.Num, Incumbent.Num, Trump.Num, Party.Committee.Num, Emily.s.List.Num, Maggie.s.List.Num, Sanders.Num, Renew.America.Num, Winner.Num, Senator.Num, CAND_ID, CMTE_ID, transaction_id, contributor_state, contributor_zip, contributor_id, contribution_receipt_date, contribution_receipt_amount, contributor_aggregate_ytd)

final_receipt_date$contribution_receipt_amount[is.na(final_receipt_date$contribution_receipt_amount)] <- 0
head(final_receipt_date)
write.csv(final_receipt_date, paste(wd, "/output data/", "final_receipt_date.csv", sep=''))

# filter data for representatives and senators
final_house <- final %>%
                filter(Senator.Num == 0) %>%
                select(-Senator.Num)
head(final_house)
final_senate  <- final %>%
                  filter(Senator.Num == 1) %>%
                  select(-District, -Senator.Num)
head(final_senate)
# filter data for competitive districts
# competitive districts defined as districts with + or - 10 ppt difference in 2020 presidential vote share (DEM - GOP)
final_competitive <- final %>%
                      filter(abs(`Dem-Rep.2020`) <= .1) %>%
                      select(-District)
head(final_competitive)

Fig. 1.

library(gtrendsR)
Gtrenddata2022 <- gtrends(c("Abortion", "Trump", "Supreme Court", "Crime", "FBI"), 
               geo = "US", time = "2022-01-01 2022-12-01",onlyInterest = TRUE)

plot(Gtrenddata2022)

Fig. 1 tracks relative Google search interest in the terms ‘Abortion’, ‘Crime’, ‘FBI’, ‘Supreme Court’, and ‘Trump’ during 2022. These terms generally reflect the issues that were salient during the 2022 election. Abortion and Trump were the most prevalent in the public’s mind, as reflected by the most US Google search hits. Both Abortion and Supreme Court spike around the time of the Dobbs decision that overturned Roe v. Wade. The spike in both FBI and Trump coincides with the Mar-a-Lago raid in early August. Crime was another major issue for voters in this election, yet we don’t see large changes in how often people searched Google for crime-related topics. We also tested ‘economy’, which had a similar trend to ‘crime’ but with fewer hits.

Fig. 2.

# cumulative sum of contributions
final_receipt_date_party_cumulative <- final_receipt_date %>%
                                  group_by(Party, contribution_receipt_date) %>%
                                  summarise(fundraised = sum(as.double(contribution_receipt_amount))) %>%
                                  mutate(cumulative_fundraised = cumsum(fundraised))
## `summarise()` has grouped output by 'Party'. You can override using the
## `.groups` argument.
ggplot(final_receipt_date_party_cumulative, aes(x = as.Date(contribution_receipt_date), y = cumulative_fundraised, color = Party)) +
  geom_line(aes(y = cumulative_fundraised, color = Party)) +
  scale_x_date(limits = as.Date(c("2022-01-01", "2022-10-19"))) +
  labs(title = "Fig. 2 Cumulative Sum of Individual Contributions for Federal Candidates of \nEach Party", 
       x = "Day of Contribution", 
       y = "Contribution Amount ($)") +
  scale_color_manual(values = c("GOP" = "red", "DEM" = "blue")) +
  geom_vline(xintercept=c(as.Date("2022-04-01"), as.Date("2022-07-01"), as.Date("2022-10-01")), linetype = 2, color = "black", size = 0.1) +
  geom_text(aes(as.Date("2022-04-01"), 0, label = "End Q1"), size= 3) +
  geom_text(aes(as.Date("2022-07-01"), 0, label = "End Q2"), size= 3) +
  geom_text(aes(as.Date("2022-10-01"), 0, label = "End Q3"), size= 3) +
  theme_classic() +
  theme(panel.grid.major.y = element_line(color = "grey", size = 0.1))
## Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
## ℹ Please use `linewidth` instead.
## Warning: The `size` argument of `element_line()` is deprecated as of ggplot2 3.4.0.
## ℹ Please use the `linewidth` argument instead.
## Warning: Removed 24 rows containing missing values (`geom_line()`).

Fig. 2 shows the cumulative sum of all individual contributions from 1/1/22 to 10/19/22. Both parties spike at end-of-quarter (EOQ) deadlines: at the end of each disclosure period, every campaign has to publicly release its fundraising report for that quarter, so candidates tend to push to bring in as much money as they can in order to show fundraising momentum when their records are released. While both parties spike at EOQ deadlines, the Democrats clearly outperformed the Republicans after Q1. This could be due in part to the fact that Democrats rely more on individual contributions than Republicans, who tend to make up the shortfall in candidate fundraising with large spending from outside groups (https://www.washingtonpost.com/politics/2022/10/07/house-democrats-fundraising/).

Fig. 3.

# by day
final_receipt_date_party_day <- final_receipt_date %>%
                                  group_by(Party, contribution_receipt_date) %>%
                                  summarise(fundraised = sum(as.double(contribution_receipt_amount)))
## `summarise()` has grouped output by 'Party'. You can override using the
## `.groups` argument.
ggplot(final_receipt_date_party_day, aes(x = as.Date(contribution_receipt_date), y = fundraised, color = Party)) +
  geom_line(aes(y = fundraised, color = Party)) +
  scale_x_date(limits = as.Date(c("2022-01-01", "2022-10-19"))) +
  labs(title = "Fig. 3 Sum of Individual Contributions by Day for Federal Candidates \nof Each Party", 
       x = "Day of Contribution", 
       y = "Contribution Amount ($)") +
  scale_color_manual(values = c("GOP" = "red", "DEM" = "blue")) +
  geom_vline(xintercept=c(as.Date("2022-05-02"), as.Date("2022-06-24"), as.Date("2022-08-08")), linetype = 2, color = "black", size = 0.1) +
  geom_text(aes(as.Date("2022-05-02"), 12000000, label = "Dobbs Leak"), size= 3) +
  geom_text(aes(as.Date("2022-06-24"), 10000000, label = "Dobbs Decision"), size= 3) +
  geom_text(aes(as.Date("2022-08-08"), 8000000, label = "Trump FBI Raid"), size= 3) +
  theme_classic() +
  theme(panel.grid.major.y = element_line(color = "grey", size = 0.1))
## Warning: Removed 24 rows containing missing values (`geom_line()`).

Fig. 3 shows the contributions each party brought in by day between 1/1/22 and 10/19/22. The largest spikes correspond to the EOQ deadlines seen in Fig. 2. Also included are the dates of major political shocks; notably, the Dobbs decision happened near the end of Q2. The Democrats led that spike in fundraising and then kept the daily lead throughout the next quarter. GOP fundraising also did not catch up after the FBI raid on Mar-a-Lago, which became a large part of the GOP fundraising message around that time.

Fig. 4.

# by week
final_receipt_date_party_week <- final_receipt_date %>%
                                  mutate(week = week(as.Date(contribution_receipt_date))) %>%
                                  group_by(Party, week) %>%
                                  summarise(fundraised = sum(as.double(contribution_receipt_amount)))
## `summarise()` has grouped output by 'Party'. You can override using the
## `.groups` argument.
ggplot(final_receipt_date_party_week, aes(x = as.Date(paste(2022, week, 1, sep="-"), "%Y-%U-%u"), y = fundraised, color = Party)) +
  geom_line(aes(y = fundraised, color = Party)) +
  scale_x_date(limits = as.Date(c("2022-01-01", "2022-10-19"))) +
  labs(title = "Fig. 4 Sum of Individual Contributions by Week for Federal Candidates \nof Each Party", 
       x = "Week of Contribution", 
       y = "Contribution Amount ($)") +
  scale_color_manual(values = c("GOP" = "red", "DEM" = "blue")) +
  geom_vline(xintercept=c(as.Date("2022-05-02"), as.Date("2022-06-24"), as.Date("2022-08-08")), linetype = 2, color = "black", size = 0.1) +
  geom_text(aes(as.Date("2022-05-02"), 30000000, label = "Dobbs Leak"), size= 3) +
  geom_text(aes(as.Date("2022-06-24"), 28000000, label = "Dobbs Decision"), size= 3) +
  geom_text(aes(as.Date("2022-08-08"), 26000000, label = "Trump FBI Raid"), size= 3) +
  theme_classic() +
  theme(panel.grid.major.y = element_line(color = "grey", size = 0.1))
## Warning: Removed 10 rows containing missing values (`geom_line()`).

Fig. 4 shows the weekly contributions to each party between 1/1/22 and 10/19/22. Democrats clearly held a fundraising advantage from the Dobbs leak through the rest of the cycle. While most spikes coincide across parties (when one party has a good week, so does the other), there are a few key moments where Democrats achieved smaller spikes without a similar trend in Republican candidate fundraising.
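Weekly totals like these can also be computed by snapping each receipt date to the start of its week, which avoids converting a week number back into a date for plotting. The sketch below is an alternative illustration in base R with made-up receipts. It is not the code used for Fig. 4, and the Monday week start and the `date` and `amount` column names are assumptions.

```r
# Toy receipts; dates and amounts are made up for illustration.
receipts <- data.frame(
  Party  = c("DEM", "DEM", "GOP", "DEM", "GOP"),
  date   = as.Date(c("2022-06-20", "2022-06-24", "2022-06-24",
                     "2022-06-27", "2022-06-29")),
  amount = c(100, 250, 75, 50, 125)
)

# Snap each date to the Monday that starts its week.
# format(date, "%u") gives the weekday number, 1 = Monday ... 7 = Sunday.
weekday <- as.integer(format(receipts$date, "%u"))
receipts$week_start <- receipts$date - (weekday - 1)

# Sum contributions per party and week.
weekly <- aggregate(amount ~ Party + week_start, data = receipts, FUN = sum)
print(weekly)
```

Because `week_start` is a real date, it can go straight onto a date axis. Week numbers can follow different conventions (for example, counting seven-day blocks from January 1 versus calendar weeks starting on Sunday), so carrying the date itself sidesteps any mismatch between how weeks are numbered and how they are parsed back.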

Fig. 5.

# look at only competitive districts
# competitive districts defined as districts with + or - 10 ppt difference in 2020 presidential vote share (DEM - GOP)
# by day
final_receipt_date_party_day_competitive <- final_receipt_date %>%
                                              filter(abs(`Dem-Rep.2020`) <= .1) %>%
                                              group_by(Party, contribution_receipt_date) %>%
                                              summarise(fundraised = sum(as.double(contribution_receipt_amount)))
## `summarise()` has grouped output by 'Party'. You can override using the
## `.groups` argument.
ggplot(final_receipt_date_party_day_competitive, aes(x = as.Date(contribution_receipt_date), y = fundraised, color = Party)) +
  geom_line(aes(y = fundraised, color = Party)) +
  scale_x_date(limits = as.Date(c("2022-01-01", "2022-10-19"))) +
  labs(title = "Fig. 5 Sum of Individual Contributions by Day for Federal Candidates \nof Each Party Running in Competitive Districts", 
       x = "Day of Contribution", 
       y = "Contribution Amount ($)") +
  scale_color_manual(values = c("GOP" = "red", "DEM" = "blue")) +
  geom_vline(xintercept=c(as.Date("2022-05-02"), as.Date("2022-06-24"), as.Date("2022-08-08")), linetype = 2, color = "black", size = 0.1) +
  geom_text(aes(as.Date("2022-05-02"), 7400000, label = "Dobbs Leak"), size= 3) +
  geom_text(aes(as.Date("2022-06-24"), 7000000, label = "Dobbs Decision"), size= 3) +
  geom_text(aes(as.Date("2022-08-08"), 6400000, label = "Trump FBI Raid"), size= 3) +
  theme_classic() +
  theme(panel.grid.major.y = element_line(color = "grey", size = 0.1))
## Warning: Removed 14 rows containing missing values (`geom_line()`).

Fig. 5 shows the contributions to candidates in competitive districts by party and day between 1/1/22 and 10/19/22. Competitive districts were determined from the difference between the parties' vote shares in the 2020 presidential election, keeping only districts where that margin was within 10 percentage points. The fundraising gap between the parties in competitive districts became more distinct as the election approached. Democrats held the advantage in daily fundraising from the end of May forward. However, because Democrats and Republicans fundraise differently, the discrepancy may be partly explained by Republicans relying more on outside spending. More analysis and data on PAC contributions are needed to understand this dynamic.

Fig. 6.

# look at only competitive districts
# competitive districts defined as districts with + or - 10 ppt difference in 2020 presidential vote share (DEM - GOP)
# by week
final_receipt_date_party_week_competitive <- final_receipt_date %>%
                                              filter(abs(`Dem-Rep.2020`) <= .1) %>%
                                              mutate(week = week(as.Date(contribution_receipt_date))) %>%
                                              group_by(Party, week) %>%
                                              summarise(fundraised = sum(as.double(contribution_receipt_amount)))
## `summarise()` has grouped output by 'Party'. You can override using the
## `.groups` argument.
ggplot(final_receipt_date_party_week_competitive, aes(x = as.Date(paste(2022, week, 1, sep="-"), "%Y-%U-%u"), y = fundraised, color = Party)) +
  geom_line(aes(y = fundraised, color = Party)) +
  scale_x_date(limits = as.Date(c("2022-01-01", "2022-10-19"))) +
  labs(title = "Fig. 6 Sum of Individual Contributions by Week for Federal Candidates of \nEach Party Running in Competitive Districts", 
       x = "Week of Contribution", 
       y = "Contribution Amount ($)") +
  scale_color_manual(values = c("GOP" = "red", "DEM" = "blue")) +
  geom_vline(xintercept=c(as.Date("2022-05-02"), as.Date("2022-06-24"), as.Date("2022-08-08")), linetype = 2, color = "black", size = 0.1) +
  geom_text(aes(as.Date("2022-05-02"), 30000000, label = "Dobbs Leak"), size= 3) +
  geom_text(aes(as.Date("2022-06-24"), 28000000, label = "Dobbs Decision"), size= 3) +
  geom_text(aes(as.Date("2022-08-08"), 26000000, label = "Trump FBI Raid"), size= 3) +
  theme_classic() +
  theme(panel.grid.major.y = element_line(color = "grey", size = 0.1))
## Warning: Removed 7 rows containing missing values (`geom_line()`).

Fig. 6 shows the contributions to candidates in competitive districts by party and week between 1/1/22 - 10/19/22. When broken down to just competitive districts, the 2nd and 3rd EOQ spikes show a more distinctive gap between Dem and GOP fundraising - Democrats dominated individual contributions in these districts.

Fig. 7.

# show the top fifteen candidates who fundraised the most for each party at each level of office
# House Democrats

final_top_house_dems <- final_house %>%
                          filter(Party == "DEM") %>%
                          top_n(15, total_fundraised_sum)

ggplot(final_top_house_dems, aes(x = total_fundraised_sum, y = reorder(Candidate, total_fundraised_sum), fill = total_fundraised_sum)) +
  geom_bar(stat="identity") +
  geom_text(aes(label=round(total_fundraised_sum / 1000000, 1)), hjust=-0.25) +
  scale_fill_gradient(low="lightblue",high="blue") +
  labs(title = "Fig. 7 Top Fundraisers Among House Democratic Candidates", 
       x = "Contribution Amount ($)", 
       y = "Candidate Name") +
  guides(fill=guide_legend(title="Contribution Amount")) +
  scale_x_continuous(labels=scales::comma, limits = c(0, 7000000)) +
  theme_classic()

Fig. 7 shows the 15 highest Democratic fundraisers running for the House in the 2022 midterms. Most of the candidates were in competitive districts, which helps explain the wider gap in fundraising by party in these districts in Fig. 5. Katie Porter (CA-45), the highest fundraiser in this figure, is one of the most well-known members of the Democratic caucus. She has a high national profile and a strong social media following, and she was also in a competitive race against a pro-life Republican candidate (https://www.latimes.com/politics/story/2022-10-20/2022-california-midterm-election-porter-baugh-abortion-economy-environment). Four of the 15 candidates shown narrowly lost their races.

Fig. 8.

# show the top fifteen candidates who fundraised the most for each party at each level of office
# House Republicans

final_top_house_gop <- final_house %>%
                          filter(Party == "GOP") %>%
                          top_n(15, total_fundraised_sum)

ggplot(final_top_house_gop, aes(x = total_fundraised_sum, y = reorder(Candidate, total_fundraised_sum), fill = total_fundraised_sum)) +
  geom_bar(stat="identity") +
  geom_text(aes(label=round(total_fundraised_sum / 1000000, 1)), hjust=-0.25) +
  scale_fill_gradient(low="lightcoral",high="red") +
  labs(title = "Fig. 8 Top Fundraisers Among House Republican Candidates", 
       x = "Contribution Amount ($)", 
       y = "Candidate Name") +
  guides(fill=guide_legend(title="Contribution Amount")) +
  scale_x_continuous(labels=scales::comma, limits = c(0, 9500000)) +
  theme_classic()

Fig. 8 shows the 15 highest Republican fundraisers running for the House in the 2022 midterm election. Most of the candidates shown maintain some sort of presence in the GOP mainstream: Kevin McCarthy, the minority leader, is most likely going to be Speaker of the House in January; Elise Stefanik is a rising star in the GOP ranks who went from a moderate freshman Republican to ousting Liz Cheney as the third-ranking member of GOP House leadership; and Jim Jordan is one of the most outspoken Trump loyalists. Harriet Hageman, the Trump loyalist GOP challenger who defeated Cheney in her primary in Wyoming, was the 5th highest fundraiser despite never having held federal office. Interestingly, Republican candidates of color (James, Kim, Ciscomani, Steel…) were all leaders in GOP House candidate fundraising.

Fig. 9.

# show the top fifteen candidates who fundraised the most for each party at each level of office
# Senate Democrats

final_top_senate_dems <- final_senate %>%
                          filter(Party == "DEM") %>%
                          top_n(15, total_fundraised_sum)

ggplot(final_top_senate_dems, aes(x = total_fundraised_sum, y = reorder(Candidate, total_fundraised_sum), fill = total_fundraised_sum)) +
  geom_bar(stat="identity") +
  geom_text(aes(label=round(total_fundraised_sum / 1000000, 1)), hjust=-0.25) +
  scale_fill_gradient(low="lightblue",high="blue") +
  labs(title = "Fig. 9 Top Fundraisers Among Senate Democratic Candidates", 
       x = "Contribution Amount ($)", 
       y = "Candidate Name") +
  guides(fill=guide_legend(title="Contribution Amount")) +
  scale_x_continuous(labels=scales::comma, limits = c(0, 42000000)) +
  theme_classic()

Fig. 9 shows the 15 highest Democratic fundraisers running for Senate in the 2022 midterm election. This graph shows almost exclusively Senate seats that were at least slightly competitive. Five of the 15 candidates lost their election, most notably the 3rd highest fundraiser, Val Demings, who lost by 16 points to Marco Rubio in Florida. Meanwhile Mandela Barnes, the Democratic challenger in Wisconsin, lost by just 1 point, despite his opponent Ron Johnson being the highest Republican fundraiser this cycle (Fig. 10). Each state's Senate election has different factors that influence not only how much money comes in, but also how much it affects the final vote count.

Fig. 10.

# show the top fifteen candidates who fundraised the most for each party at each level of office
# Senate Republicans

final_top_senate_gop <- final_senate %>%
                          filter(Party == "GOP") %>%
                          top_n(15, total_fundraised_sum)

ggplot(final_top_senate_gop, aes(x = total_fundraised_sum, y = reorder(Candidate, total_fundraised_sum), fill = total_fundraised_sum)) +
  geom_bar(stat="identity") +
  geom_text(aes(label=round(total_fundraised_sum / 1000000, 1)), hjust=-0.25) +
  scale_fill_gradient(low="lightcoral",high="red") +
  labs(title = "Fig. 10 Top Fundraisers Among Senate Republican Candidates", 
       x = "Contribution Amount ($)", 
       y = "Candidate Name") +
  guides(fill=guide_legend(title="Contribution Amount")) +
  scale_x_continuous(labels=scales::comma, limits = c(0, 18000000)) +
  theme_classic()

Fig. 10 shows the 15 highest Republican fundraisers running for Senate in the 2022 midterm election. The top Republican fundraiser, Ron Johnson, raised less than half as much as the top two Democratic fundraisers, Kelly and Warnock, as Republicans rely more on super PACs than on individual contributions. Johnson raised about $1.1m less than his opponent Mandela Barnes. The fundraising discrepancy between the candidates in extremely competitive races is also notably large. Walker, Oz, Vance, and Masters were all running in 2020 battleground states, so we would expect their fundraising to be stronger. Vance, who raised the least of those four candidates, was the only one to win his Senate bid, despite Tim Ryan outraising him 3 to 1. It’s also interesting that two of the highest fundraisers, Murkowski and Tshibaka, were actually opponents in the Alaska Senate election; Murkowski, the incumbent, raised about 20-25% more and won.

Fig. 11.

# look at where contributions originated and went to by state and by party

final_receipt_date_contributor_state <- final_receipt_date %>%
                                          group_by(Party, contributor_state) %>%
                                          summarise(`Origin Contribution Amount` = sum(as.double(contribution_receipt_amount)))
## `summarise()` has grouped output by 'Party'. You can override using the
## `.groups` argument.
final_receipt_date_candidate_state <- final_receipt_date %>%
                                          group_by(Party, State, State_processed) %>%
                                          summarise(`Destination Contribution Amount` = sum(as.double(contribution_receipt_amount)))
## `summarise()` has grouped output by 'Party', 'State'. You can override using
## the `.groups` argument.
# For Democratic candidates
joined_state_dem <- left_join(final_receipt_date_contributor_state, final_receipt_date_candidate_state, by = c("contributor_state" = "State_processed", "Party" = "Party")) %>%
                  filter(!is.na(State), Party == "DEM") %>%
                  top_n(10, `Destination Contribution Amount`)

joined_state_dem_m <- melt(joined_state_dem[,c('State',"Destination Contribution Amount","Origin Contribution Amount")],id.vars = 1)

ggplot(joined_state_dem_m, aes(x = value, y = reorder(State, value))) +
  geom_bar(aes(fill = variable), stat="identity", position = "dodge") +
  labs(title = "Fig. 11 Top 10 States by Fundraising Amount for Democratic Candidates", 
       x = "Contribution Amount ($)", 
       y = "State") +
  scale_x_continuous(labels=scales::comma) +
  guides(fill=guide_legend(title="")) +
  theme_classic()

Fig. 11 shows the 10 states where Democratic candidates raised the most this cycle, with the amount contributed by donors within the state in cyan and the amount raised by candidates running in the state in orange. California was by far the largest source of donations, due in large part to its high population and to Democrats being viewed more favorably there. California also had relatively high fundraising numbers for candidates running in the state, which can most likely be attributed to its high number of House races. States like Arizona, Georgia, Pennsylvania, Ohio, Nevada, and Wisconsin raised far more than the donations that originated there, which makes sense given that these are battleground states with highly contested races.
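
The comparison drawn from Fig. 11 amounts to subtracting, for each state, what its residents gave from what its candidates received. A minimal sketch of that arithmetic follows; the `toy` data frame, its dollar amounts, and the `net_inflow` column are hypothetical and are not the FEC figures used in this report.

```r
# Toy data: hypothetical amounts, not the FEC figures used in this report.
toy <- data.frame(
  State       = c("CA", "GA", "AZ"),
  origin      = c(900, 150, 100),   # given by donors living in the state
  destination = c(400, 500, 350)    # raised by candidates running in the state
)

# Positive values mean candidates raised more than in-state donors gave,
# i.e. the state was a net importer of campaign money.
toy$net_inflow <- toy$destination - toy$origin

# Sort so the largest net importers come first.
toy <- toy[order(-toy$net_inflow), ]
print(toy)
```

In this toy example the battleground-style rows sort to the top and the large donor state sorts to the bottom, which is the pattern the figure describes.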

Fig. 12.

# For Republican candidates
joined_state_gop <- left_join(final_receipt_date_contributor_state, final_receipt_date_candidate_state, by = c("contributor_state" = "State_processed", "Party" = "Party")) %>%
                  filter(!is.na(State), Party == "GOP") %>%
                  top_n(10, `Destination Contribution Amount`)

joined_state_gop_m <- melt(joined_state_gop[,c('State',"Destination Contribution Amount","Origin Contribution Amount")],id.vars = 1)

ggplot(joined_state_gop_m, aes(x = value, y = reorder(State, value))) +
  geom_bar(aes(fill = variable), stat="identity", position = "dodge") +
  labs(title = "Fig. 12 Top 10 States by Fundraising Amount for Republican Candidates", 
       x = "Contribution Amount ($)", 
       y = "State") +
  scale_x_continuous(labels=scales::comma) +
  guides(fill=guide_legend(title="")) +
  theme_classic()

Fig. 12 shows the 10 states where Republican candidates raised the most this cycle, with the amount contributed by donors within the state in cyan and the amount raised by candidates running in the state in orange. Republicans had high origin contribution amounts from populous states like Florida, California, and Texas. As with Democrats, candidates running in Arizona, Georgia, Pennsylvania, Wisconsin, and Ohio raised more than contributors living in those states gave.

Analysis

# build linear model for each office level for fundraising
# house of representatives
house_model <- glm(total_fundraised_sum ~ . - Candidate - State - State_processed - District - Party, data = final_house, family="gaussian")
summary(house_model)
## 
## Call:
## glm(formula = total_fundraised_sum ~ . - Candidate - State - 
##     State_processed - District - Party, family = "gaussian", 
##     data = final_house)
## 
## Deviance Residuals: 
##      Min        1Q    Median        3Q       Max  
## -1278536   -339166   -179777    107333   8296658  
## 
## Coefficients:
##                     Estimate Std. Error t value Pr(>|t|)    
## (Intercept)            80798     216706   0.373   0.7094    
## GOP.Num                52628      70222   0.749   0.4538    
## Gender.Num            -25375      65429  -0.388   0.6983    
## White.Num              96625     165687   0.583   0.5599    
## Black.Num             -80642     169146  -0.477   0.6337    
## Asian.Num             300559     207802   1.446   0.1485    
## Latino.Num             14499     174396   0.083   0.9338    
## Middle_Eastern.Num   -124492     383699  -0.324   0.7457    
## Native_American.Num   541116     282529   1.915   0.0558 .  
## Incumbent.Num         -85585      95162  -0.899   0.3687    
## Trump.Num              20790     103166   0.202   0.8403    
## Party.Committee.Num   866306     122985   7.044 4.09e-12 ***
## Emily.s.List.Num      828938     132180   6.271 5.91e-10 ***
## Maggie.s.List.Num     214868     141303   1.521   0.1288    
## Sanders.Num           -95652     214979  -0.445   0.6565    
## Renew.America.Num    -292526     716622  -0.408   0.6832    
## Winner.Num            458377     103335   4.436 1.05e-05 ***
## `2020.White.%`        -11895     173154  -0.069   0.9452    
## Pres_2020              24487     586056   0.042   0.9667    
## Comp                  237344     504165   0.471   0.6379    
## `Dem-Rep.2020`        786932     521116   1.510   0.1314    
## `Dem-Rep.Comp`       -844300     533138  -1.584   0.1137    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for gaussian family taken to be 505639653427)
## 
##     Null deviance: 5.2962e+14  on 805  degrees of freedom
## Residual deviance: 3.9642e+14  on 784  degrees of freedom
## AIC: 24032
## 
## Number of Fisher Scoring iterations: 2

In this model of House candidates, Party Committee and Emily’s List endorsements, as well as win status, were positively associated with fundraising and statistically significant at the 5% level. Native American status was marginally significant (p = 0.056), meeting only the 10% level.
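
Reading significance off a long coefficient table by eye is error-prone, so the same filtering can be done programmatically. The sketch below is one possible approach: `significant_terms` is a hypothetical helper, not part of the project code, and it is demonstrated on the built-in `mtcars` data rather than on `final_house`.

```r
# Return the rows of a fitted lm/glm coefficient table whose p-value is
# below `alpha`. Hypothetical helper, not part of the project code.
significant_terms <- function(model, alpha = 0.05) {
  coefs <- coef(summary(model))               # estimate, SE, statistic, p-value
  p_col <- grep("^Pr\\(", colnames(coefs))    # "Pr(>|t|)" or "Pr(>|z|)"
  keep  <- coefs[, p_col] < alpha
  coefs[keep, , drop = FALSE]
}

# Demonstration on a built-in data set (mtcars), not on final_house.
demo_model <- glm(mpg ~ wt + hp + qsec, data = mtcars, family = "gaussian")
sig_05 <- significant_terms(demo_model, alpha = 0.05)
sig_10 <- significant_terms(demo_model, alpha = 0.10)
print(sig_05)
```

Calling the helper with `alpha = 0.10` also picks up terms flagged with `.` in the summary output, which is how a marginal result like the Native American coefficient would surface.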

# plot basic visuals of the model
plot(house_model)
## Warning: not plotting observations with leverage one:
##   174
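
The warning about an observation with leverage one typically arises when a single row is the only one with some indicator set to 1, so the model reproduces that row exactly. Whether that is what happened for observation 174 is an assumption; the toy example below only illustrates the mechanism, and all variable names in it are hypothetical.

```r
# Toy data: observation 6 is the only one with `rare_flag` equal to 1.
set.seed(1)
toy <- data.frame(
  y         = rnorm(10),
  x         = rnorm(10),
  rare_flag = c(0, 0, 0, 0, 0, 1, 0, 0, 0, 0)
)

fit <- glm(y ~ x + rare_flag, data = toy, family = "gaussian")

# Hat values measure leverage; a value of 1 means the fitted value is
# determined entirely by that observation's own response.
lev <- hatvalues(fit)
leverage_one <- which(lev > 1 - 1e-8)
print(leverage_one)
```

Such rows contribute nothing to the residual variance estimate, so the coefficient on the rare indicator rests on a single candidate.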

# build linear model for each office level for fundraising
# senate model
senate_model <- glm(total_fundraised_sum ~ . - Candidate - State_processed - State - Party, data = final_senate, family="gaussian")
summary(senate_model)
## 
## Call:
## glm(formula = total_fundraised_sum ~ . - Candidate - State_processed - 
##     State - Party, family = "gaussian", data = final_senate)
## 
## Deviance Residuals: 
##      Min        1Q    Median        3Q       Max  
## -9866544  -4513949   -839941   2099391  24805139  
## 
## Coefficients:
##                      Estimate Std. Error t value Pr(>|t|)  
## (Intercept)          13604342   12534971   1.085   0.2834  
## GOP.Num               1298602    4011008   0.324   0.7476  
## Gender.Num            4267648    3285674   1.299   0.2005  
## White.Num             1019774    6776886   0.150   0.8810  
## Black.Num             6570202    7724269   0.851   0.3994  
## Asian.Num           -11733104    9244647  -1.269   0.2108  
## Latino.Num            1635883    7041803   0.232   0.8173  
## Middle_Eastern.Num   11049531   11244542   0.983   0.3309  
## Native_American.Num  -1160448    6688013  -0.174   0.8630  
## Incumbent.Num         2355866    5673584   0.415   0.6799  
## Trump.Num             3450534    3674620   0.939   0.3526  
## Party.Committee.Num  -5023172    6516587  -0.771   0.4447  
## Emily.s.List.Num     11687898    5255787   2.224   0.0311 *
## Maggie.s.List.Num     4747322    7491348   0.634   0.5294  
## Sanders.Num          13433738    9135195   1.471   0.1482  
## Renew.America.Num     3601442    9469739   0.380   0.7055  
## Winner.Num            7078693    4876333   1.452   0.1534  
## `2020.White.%`       -9565337   10368512  -0.923   0.3611  
## Pres_2020           -17303592   47761403  -0.362   0.7188  
## Comp                 -9512435   47423748  -0.201   0.8419  
## `Dem-Rep.2020`       15356441   26154783   0.587   0.5600  
## `Dem-Rep.Comp`      -22094526   28462190  -0.776   0.4416  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for gaussian family taken to be 6.572411e+13)
## 
##     Null deviance: 4.8543e+15  on 67  degrees of freedom
## Residual deviance: 3.0233e+15  on 46  degrees of freedom
## AIC: 2375.9
## 
## Number of Fisher Scoring iterations: 2

Among Senate candidates, an Emily’s List endorsement was the only variable with a statistically significant association with fundraising at the 5% level (p = 0.031).

# plot basic visuals of the model
plot(senate_model)
## Warning: not plotting observations with leverage one:
##   37, 40, 46, 59

# build linear model for each office level for fundraising
# competitive districts
competitive_model <- glm(total_fundraised_sum ~ . - Candidate - State - State_processed - Party, data = final_competitive, family="gaussian")
summary(competitive_model)
## 
## Call:
## glm(formula = total_fundraised_sum ~ . - Candidate - State - 
##     State_processed - Party, family = "gaussian", data = final_competitive)
## 
## Deviance Residuals: 
##       Min         1Q     Median         3Q        Max  
## -12268489    -848881      76462    1035696   19165700  
## 
## Coefficients: (1 not defined because of singularities)
##                     Estimate Std. Error t value Pr(>|t|)    
## (Intercept)          3322511    6120918   0.543   0.5880    
## GOP.Num              1101457     772087   1.427   0.1557    
## Gender.Num           -218150     810514  -0.269   0.7882    
## White.Num             168242    1403072   0.120   0.9047    
## Black.Num            1986476    1576572   1.260   0.2096    
## Asian.Num             429797    2262050   0.190   0.8496    
## Latino.Num           -361048    1424548  -0.253   0.8003    
## Middle_Eastern.Num  -1150072    3879218  -0.296   0.7673    
## Native_American.Num  -832333    2619948  -0.318   0.7512    
## Incumbent.Num        1395526     756064   1.846   0.0668 .  
## Senator.Num         14382065     898347  16.009   <2e-16 ***
## Trump.Num           -1116557     917100  -1.217   0.2253    
## Party.Committee.Num   644896     744400   0.866   0.3877    
## Emily.s.List.Num      -44995    1136001  -0.040   0.9685    
## Maggie.s.List.Num     147109    1277721   0.115   0.9085    
## Sanders.Num          1679869    3620851   0.464   0.6433    
## Renew.America.Num         NA         NA      NA       NA    
## Winner.Num           1017024     770635   1.320   0.1889    
## `2020.White.%`      -2110329    1945772  -1.085   0.2798    
## Pres_2020            3207249   15138129   0.212   0.8325    
## Comp                -8106584   10779905  -0.752   0.4532    
## `Dem-Rep.2020`      -1290337    7105548  -0.182   0.8561    
## `Dem-Rep.Comp`       2403656    6137680   0.392   0.6959    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for gaussian family taken to be 1.147759e+13)
## 
##     Null deviance: 5.7186e+15  on 175  degrees of freedom
## Residual deviance: 1.7675e+15  on 154  degrees of freedom
## AIC: 5814.5
## 
## Number of Fisher Scoring iterations: 2

Among candidates in competitive races, only Senator status was statistically significant. Incumbency status had a p-value below 0.1 (p = 0.067), which is not low enough to be significant at the 5% level.
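
The `NA` row for `Renew.America.Num` and the note "1 not defined because of singularities" in the summary above mean that column is a linear combination of the other predictors. The simplest case, and a plausible one here given that the PCA step later drops the same column, is a variable that never varies within the competitive subset. A quick check is sketched below on toy data; the data frame and column names are hypothetical.

```r
# Toy data: `never_endorsed` is 0 for every row, so it carries no information.
toy <- data.frame(
  raised         = c(1.2, 3.4, 2.2, 5.1, 4.0, 2.9),
  incumbent      = c(0, 1, 0, 1, 1, 0),
  never_endorsed = c(0, 0, 0, 0, 0, 0)
)

# Columns with a single distinct value cannot be estimated alongside an intercept.
is_constant   <- vapply(toy, function(col) length(unique(col)) == 1, logical(1))
constant_cols <- names(toy)[is_constant]
print(constant_cols)

# glm() reports NA for such a term, matching the pattern in the summary above.
fit <- glm(raised ~ ., data = toy, family = "gaussian")
print(coef(fit))
```

Dropping such columns before fitting keeps the coefficient table free of undefined rows.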

# plot basic visuals of the model
plot(competitive_model)
## Warning: not plotting observations with leverage one:
##   111, 125

# build logistic model for probability of winning
# house of representatives
house_model2 <- glm(Winner.Num ~ . - Candidate - State - State_processed - District - Party, data = final_house, family="binomial")
summary(house_model2)
## 
## Call:
## glm(formula = Winner.Num ~ . - Candidate - State - State_processed - 
##     District - Party, family = "binomial", data = final_house)
## 
## Deviance Residuals: 
##      Min        1Q    Median        3Q       Max  
## -3.06874  -0.10642   0.00002   0.13882   2.33378  
## 
## Coefficients:
##                        Estimate Std. Error z value Pr(>|z|)    
## (Intercept)          -1.608e+01  2.512e+00  -6.402 1.54e-10 ***
## GOP.Num              -2.235e+00  5.522e-01  -4.048 5.16e-05 ***
## Gender.Num            4.720e-01  5.037e-01   0.937  0.34865    
## White.Num            -1.045e+00  1.269e+00  -0.824  0.41016    
## Black.Num            -9.752e-01  1.365e+00  -0.715  0.47484    
## Asian.Num            -1.647e+00  1.517e+00  -1.085  0.27782    
## Latino.Num            4.986e-01  1.288e+00   0.387  0.69878    
## Middle_Eastern.Num    4.742e+00  1.175e+02   0.040  0.96780    
## Native_American.Num  -6.738e-01  1.745e+00  -0.386  0.69943    
## Incumbent.Num         3.908e+00  5.004e-01   7.810 5.74e-15 ***
## Trump.Num            -1.232e+00  7.787e-01  -1.582  0.11361    
## Party.Committee.Num   4.212e-01  5.196e-01   0.811  0.41762    
## Emily.s.List.Num      5.215e-01  7.338e-01   0.711  0.47727    
## Maggie.s.List.Num    -4.866e-01  9.277e-01  -0.525  0.59992    
## Sanders.Num           1.559e+01  8.283e+02   0.019  0.98498    
## Renew.America.Num     1.115e+01  3.956e+03   0.003  0.99775    
## `2020.White.%`        1.951e+00  1.272e+00   1.534  0.12499    
## Pres_2020             3.620e+01  6.687e+00   5.413 6.19e-08 ***
## Comp                 -6.388e+00  4.735e+00  -1.349  0.17734    
## `Dem-Rep.2020`       -6.615e+00  4.545e+00  -1.455  0.14556    
## `Dem-Rep.Comp`        7.241e-01  4.130e+00   0.175  0.86083    
## total_fundraised_sum  8.901e-07  2.873e-07   3.098  0.00195 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 1117.33  on 805  degrees of freedom
## Residual deviance:  213.01  on 784  degrees of freedom
## AIC: 257.01
## 
## Number of Fisher Scoring iterations: 16

This model shows which characteristics are associated with a House candidate’s chances of winning. GOP status, incumbency, total fundraised, and the Pres_2020 variable were statistically significant at the 5% level. The GOP coefficient is negative, while the other three are positive.
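
Logistic regression coefficients are on the log-odds scale, so exponentiating them gives odds ratios that are easier to interpret. The sketch below converts three estimates copied from the summary above; rescaling the fundraising coefficient to $1 million is a presentational choice made for this illustration.

```r
# Estimates copied from the House logistic model summary above.
beta <- c(
  GOP.Num              = -2.235,
  Incumbent.Num        =  3.908,
  total_fundraised_sum =  8.901e-07   # per additional dollar raised
)

# exp(beta) is the multiplicative change in the odds of winning for a
# one-unit increase, holding the other covariates fixed.
odds_ratio <- exp(beta[c("GOP.Num", "Incumbent.Num")])

# A one-dollar change is too small to read, so rescale to $1 million.
or_per_million <- exp(beta[["total_fundraised_sum"]] * 1e6)

print(round(odds_ratio, 2))
print(round(or_per_million, 2))
```

Under these estimates, an additional $1 million raised multiplies the odds of winning by roughly 2.4 and incumbency multiplies them by roughly 50, in each case holding the other covariates fixed.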

# plot basic visuals of the models
plot(house_model2)
## Warning: not plotting observations with leverage one:
##   174

# build logistic model for probability of winning
# senate
senate_model2 <- glm(Winner.Num ~ . - Candidate - State - State_processed - Party, data = final_senate, family="binomial")
## Warning: glm.fit: algorithm did not converge
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
summary(senate_model2)
## 
## Call:
## glm(formula = Winner.Num ~ . - Candidate - State - State_processed - 
##     Party, family = "binomial", data = final_senate)
## 
## Deviance Residuals: 
##        Min          1Q      Median          3Q         Max  
## -2.069e-05  -2.110e-08  -2.110e-08   2.110e-08   2.854e-05  
## 
## Coefficients:
##                        Estimate Std. Error z value Pr(>|z|)
## (Intercept)          -4.581e+02  4.746e+07   0.000        1
## GOP.Num              -2.207e+01  4.642e+05   0.000        1
## Gender.Num           -1.592e+01  3.712e+05   0.000        1
## White.Num             3.628e+01  4.746e+07   0.000        1
## Black.Num             4.851e+01  4.746e+07   0.000        1
## Asian.Num            -4.403e+01  5.376e+05   0.000        1
## Latino.Num            2.596e+01  4.746e+07   0.000        1
## Middle_Eastern.Num   -8.345e+01  4.746e+07   0.000        1
## Native_American.Num  -8.042e+01  8.184e+05   0.000        1
## Incumbent.Num         1.095e+02  4.622e+05   0.000        1
## Trump.Num             5.701e+01  5.792e+05   0.000        1
## Party.Committee.Num  -7.959e+01  8.014e+05   0.000        1
## Emily.s.List.Num     -3.516e+01  5.754e+05   0.000        1
## Maggie.s.List.Num    -6.658e+01  4.746e+07   0.000        1
## Sanders.Num          -4.922e+01  4.541e+05   0.000        1
## Renew.America.Num     3.361e+01  5.781e+05   0.000        1
## `2020.White.%`        2.972e+02  8.125e+05   0.000        1
## Pres_2020             4.701e+02  3.944e+06   0.000        1
## Comp                 -3.306e+01  2.317e+06   0.000        1
## `Dem-Rep.2020`       -6.909e+02  1.990e+06   0.000        1
## `Dem-Rep.Comp`        8.288e+02  1.626e+06   0.001        1
## total_fundraised_sum  7.587e-07  1.150e-02   0.000        1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 9.4033e+01  on 67  degrees of freedom
## Residual deviance: 2.9417e-09  on 46  degrees of freedom
## AIC: 44
## 
## Number of Fisher Scoring iterations: 25

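In this model of Senate candidates, no variable was statistically significant in predicting the winning outcome, but the fit itself is not reliable. The warnings that the algorithm did not converge and that fitted probabilities were numerically 0 or 1, together with a residual deviance near zero and very large standard errors, indicate that the predictors perfectly separate winners from losers. This is most likely due to the low number of observations (68) relative to the 22 coefficients being estimated.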
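
The pattern in the Senate output (non-convergence warnings, a residual deviance near zero, p-values of 1) is what complete separation looks like in `glm()`. The toy example below reproduces those symptoms with a single predictor; the data are made up, and the check shown is a rough diagnostic rather than a formal test.

```r
# Toy data with complete separation: every x above 5 wins, every other x loses.
toy <- data.frame(x = 1:10, win = c(0, 0, 0, 0, 0, 1, 1, 1, 1, 1))

# Collect warnings instead of printing them, so they can be inspected.
fit_warnings <- character(0)
fit <- withCallingHandlers(
  glm(win ~ x, data = toy, family = "binomial"),
  warning = function(w) {
    fit_warnings <<- c(fit_warnings, conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)

# Symptoms of separation: fitted probabilities pinned at 0 or 1, a residual
# deviance near zero, and standard errors far larger than the estimates.
p_hat <- fitted(fit)
print(fit_warnings)
print(max(pmin(p_hat, 1 - p_hat)))
print(deviance(fit))
print(coef(summary(fit)))
```

When a model shows these symptoms, fitting fewer predictors or pooling sparse categories is a reasonable first step before interpreting any coefficient.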

# plot basic visuals of the models
plot(senate_model2)
## Warning: not plotting observations with leverage one:
##   37, 40, 46, 59

## Warning in sqrt(crit * p * (1 - hh)/hh): NaNs produced
## Warning in sqrt(crit * p * (1 - hh)/hh): NaNs produced

# build logistic model for probability of winning
# competitive
competitive_model2 <- glm(Winner.Num ~ . - Candidate - State - State_processed - Party, data = final_competitive, family="binomial")
summary(competitive_model2)
## 
## Call:
## glm(formula = Winner.Num ~ . - Candidate - State - State_processed - 
##     Party, family = "binomial", data = final_competitive)
## 
## Deviance Residuals: 
##      Min        1Q    Median        3Q       Max  
## -2.25859  -0.38923   0.00766   0.34084   2.13679  
## 
## Coefficients: (1 not defined because of singularities)
##                        Estimate Std. Error z value Pr(>|z|)    
## (Intercept)          -4.026e+01  7.622e+00  -5.283 1.27e-07 ***
## GOP.Num              -2.606e+00  8.254e-01  -3.157  0.00159 ** 
## Gender.Num            8.466e-01  8.334e-01   1.016  0.30968    
## White.Num             1.777e+00  1.434e+00   1.239  0.21520    
## Black.Num             2.530e+00  1.653e+00   1.530  0.12590    
## Asian.Num             2.546e+00  2.256e+00   1.128  0.25912    
## Latino.Num            4.716e+00  1.588e+00   2.970  0.00298 ** 
## Middle_Eastern.Num   -1.463e+01  2.400e+03  -0.006  0.99513    
## Native_American.Num  -5.690e-01  1.776e+00  -0.320  0.74872    
## Incumbent.Num         3.769e+00  8.607e-01   4.379 1.19e-05 ***
## Senator.Num          -5.456e-02  1.584e+00  -0.034  0.97253    
## Trump.Num            -1.170e+00  9.008e-01  -1.299  0.19408    
## Party.Committee.Num  -2.227e-01  6.703e-01  -0.332  0.73965    
## Emily.s.List.Num      5.765e-01  1.102e+00   0.523  0.60093    
## Maggie.s.List.Num     1.120e+00  1.276e+00   0.878  0.38015    
## Sanders.Num          -1.687e+01  2.400e+03  -0.007  0.99439    
## Renew.America.Num            NA         NA      NA       NA    
## `2020.White.%`        4.502e+00  2.047e+00   2.199  0.02787 *  
## Pres_2020             6.032e+01  1.543e+01   3.909 9.26e-05 ***
## Comp                  1.052e+01  1.174e+01   0.896  0.37026    
## `Dem-Rep.2020`       -5.359e+00  7.376e+00  -0.727  0.46750    
## `Dem-Rep.Comp`        7.180e+00  6.257e+00   1.147  0.25120    
## total_fundraised_sum  6.626e-08  9.857e-08   0.672  0.50143    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 243.99  on 175  degrees of freedom
## Residual deviance: 105.39  on 154  degrees of freedom
## AIC: 149.39
## 
## Number of Fisher Scoring iterations: 15

Among candidates in competitive races, incumbency status, presidential vote share in 2020, GOP status, Latino status, and the percentage of white residents in the district were statistically significant at the 5% level in predicting a win.

# plot basic visuals of the model
plot(competitive_model2)
## Warning: not plotting observations with leverage one:
##   111, 125

# conduct principal component analysis for each data set

# for house
final_house_pca <-
  as.data.frame(final_house) %>%
  select(-Candidate, -State, -State_processed, -Party, -District) %T>%
  pairs(.)

pca_house <- prcomp(x = final_house_pca, 
                  scale. = TRUE)
summary(pca_house)
## Importance of components:
##                           PC1    PC2    PC3     PC4     PC5    PC6     PC7
## Standard deviation     1.9014 1.8470 1.3618 1.24252 1.17086 1.1010 1.04295
## Proportion of Variance 0.1643 0.1551 0.0843 0.07018 0.06231 0.0551 0.04944
## Cumulative Proportion  0.1643 0.3194 0.4037 0.47387 0.53618 0.5913 0.64072
##                            PC8     PC9    PC10    PC11    PC12    PC13    PC14
## Standard deviation     1.02695 1.00068 0.99523 0.97272 0.91376 0.87600 0.77513
## Proportion of Variance 0.04794 0.04552 0.04502 0.04301 0.03795 0.03488 0.02731
## Cumulative Proportion  0.68866 0.73418 0.77920 0.82221 0.86016 0.89504 0.92235
##                          PC15    PC16   PC17    PC18    PC19    PC20    PC21
## Standard deviation     0.6667 0.60989 0.5801 0.53518 0.39160 0.23269 0.22284
## Proportion of Variance 0.0202 0.01691 0.0153 0.01302 0.00697 0.00246 0.00226
## Cumulative Proportion  0.9425 0.95946 0.9748 0.98778 0.99475 0.99721 0.99947
##                           PC22
## Standard deviation     0.10845
## Proportion of Variance 0.00053
## Cumulative Proportion  1.00000
fviz_screeplot(pca_house)

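Of the first 10 dimensions shown in the scree plot, the first two explain about 16% of the variance each, dimensions 3 through 6 explain between 5.5% and 8.4% each, and dimensions 7 through 10 fall just below 5%. Together the first 10 dimensions account for about 78% of the variance.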
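
The "Proportion of Variance" and "Cumulative Proportion" rows in the summary can be recomputed directly from the `sdev` element of a `prcomp` object, which makes it easy to ask how many components are needed to reach a given share. The sketch below uses the built-in `USArrests` data rather than `final_house_pca`, and the 80% target is an arbitrary choice for illustration.

```r
# Demonstration on a built-in data set (USArrests), not on final_house_pca.
pca_demo <- prcomp(USArrests, scale. = TRUE)

# Each component's share of variance is its squared standard deviation
# divided by the total of the squared standard deviations.
var_share <- pca_demo$sdev^2 / sum(pca_demo$sdev^2)
cum_share <- cumsum(var_share)

# Smallest number of components whose cumulative share reaches the target.
n_needed <- which(cum_share >= 0.80)[1]

print(round(var_share, 4))
print(n_needed)
```

Applied to the House output above, the same rule would select 11 components, since the cumulative proportion first passes 80% at PC11 (0.82221).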

# for senate
final_senate_pca <-
  as.data.frame(final_senate) %>%
  select(-Candidate, -State, -State_processed, -Party) %T>%
  pairs(.)

pca_senate <- prcomp(x = final_senate_pca, 
                  scale. = TRUE)
summary(pca_senate)
## Importance of components:
##                           PC1    PC2     PC3     PC4     PC5     PC6    PC7
## Standard deviation     2.0515 1.7211 1.47590 1.40218 1.24354 1.13690 1.0634
## Proportion of Variance 0.1913 0.1346 0.09901 0.08937 0.07029 0.05875 0.0514
## Cumulative Proportion  0.1913 0.3259 0.42495 0.51432 0.58461 0.64336 0.6948
##                            PC8     PC9    PC10    PC11    PC12    PC13    PC14
## Standard deviation     1.05395 0.99130 0.95454 0.92009 0.85839 0.75357 0.65405
## Proportion of Variance 0.05049 0.04467 0.04142 0.03848 0.03349 0.02581 0.01944
## Cumulative Proportion  0.74526 0.78993 0.83134 0.86982 0.90332 0.92913 0.94857
##                           PC15    PC16    PC17    PC18    PC19    PC20   PC21
## Standard deviation     0.61424 0.50633 0.45342 0.35165 0.27780 0.23145 0.1482
## Proportion of Variance 0.01715 0.01165 0.00935 0.00562 0.00351 0.00243 0.0010
## Cumulative Proportion  0.96572 0.97738 0.98672 0.99234 0.99585 0.99828 0.9993
##                           PC22
## Standard deviation     0.12572
## Proportion of Variance 0.00072
## Cumulative Proportion  1.00000
fviz_screeplot(pca_senate)

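Of the first 10 dimensions shown in the scree plot, the first explains about 19% of the variance and the second about 13%. Dimensions 3 through 8 each explain between 5% and 10%, and dimensions 9 and 10 fall slightly below 5%. Together the first 10 dimensions account for about 83% of the variance.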

# for competitive districts
final_competitive_pca <-
  as.data.frame(final_competitive) %>%
  select(-Candidate, -State, -State_processed, -Party, -Renew.America.Num) %T>%
  pairs(.)

pca_competitive <- prcomp(x = final_competitive_pca, 
                  scale. = TRUE)
summary(pca_competitive)
## Importance of components:
##                           PC1    PC2    PC3     PC4     PC5    PC6     PC7
## Standard deviation     1.7436 1.5769 1.5038 1.37800 1.28231 1.2113 1.10870
## Proportion of Variance 0.1382 0.1130 0.1028 0.08631 0.07474 0.0667 0.05587
## Cumulative Proportion  0.1382 0.2512 0.3540 0.44032 0.51506 0.5818 0.63763
##                            PC8     PC9    PC10    PC11    PC12    PC13    PC14
## Standard deviation     1.07136 1.00831 1.00065 0.92541 0.89189 0.80740 0.75311
## Proportion of Variance 0.05217 0.04621 0.04551 0.03893 0.03616 0.02963 0.02578
## Cumulative Proportion  0.68980 0.73601 0.78153 0.82045 0.85661 0.88624 0.91203
##                           PC15    PC16    PC17    PC18    PC19    PC20    PC21
## Standard deviation     0.69393 0.57392 0.55322 0.47556 0.44225 0.40491 0.39307
## Proportion of Variance 0.02189 0.01497 0.01391 0.01028 0.00889 0.00745 0.00702
## Cumulative Proportion  0.93391 0.94889 0.96280 0.97308 0.98197 0.98942 0.99644
##                           PC22
## Standard deviation     0.27977
## Proportion of Variance 0.00356
## Cumulative Proportion  1.00000
fviz_screeplot(pca_competitive)

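Of the first 10 dimensions shown in the scree plot, the first three each explain between 10% and 14% of the variance, dimensions 4 through 8 each explain between 5% and 9%, and dimensions 9 and 10 fall slightly below 5%. Together the first 10 dimensions account for about 78% of the variance.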
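
The variance tables show how much each component explains but not which variables drive it; that information is in the loadings stored in the `rotation` element of a `prcomp` object. Note that PCA is unsupervised, so a large loading marks a variable that contributes to shared variation, not necessarily one that predicts fundraising or winning. A minimal sketch on the built-in `USArrests` data (not the project data):

```r
# Demonstration on a built-in data set (USArrests), not on the project data.
pca_demo <- prcomp(USArrests, scale. = TRUE)

# Loadings: one column per component, one row per original variable.
loadings_pc1 <- pca_demo$rotation[, "PC1"]

# Rank variables by the absolute size of their PC1 loading.
ranked <- sort(abs(loadings_pc1), decreasing = TRUE)
print(round(ranked, 3))
```

The same two lines applied to `pca_house`, `pca_senate`, or `pca_competitive` would list the variables contributing most to each leading component.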

Discussion

Between the two parties, Democrats had the better fundraising performance from individual contributions in the 2022 Midterm Election. We saw a distinct gap open in fundraising trends after abortion became a more salient political issue. Before the Dobbs leak on May 2nd, 2022, both parties were fundraising at about the same rate. For Democrats, while the Dobbs decision was a blow to their greater agenda, it was a watershed moment for their fundraising. Presidential approval was low, global markets had been turbulent, and inflation was at record highs, so enthusiasm for fundraising was low. However, Dobbs gave Democrats a unifying rallying call in their messaging that changed the trajectory of their fundraising and, in turn, their chances to win.

A plurality of Democratic candidates ran on the message of choice, and its appeal was seen in the Kansas abortion referendum and in Democratic overperformance in key special elections in Alaska, Wisconsin, and New York. In our results, an endorsement from Emily’s List, a pro-choice women’s organization, was associated with significantly higher fundraising for those candidates. High fundraising is indicative of a greater chance of winning, so every cycle candidates strive to raise more than ever. Based on these trends, Democrats’ overperformance in fundraising in the months leading up to the election also indicated the potential to outperform election expectations, despite the fundamentals being against them (presidential approval, the economy, and the midterm advantage of the party out of power).

One of the advantages of our work is that it identifies key candidate characteristics from the primaries, such as endorsements, fundraising numbers, district characteristics, and candidate demographics. It also allows us to look at both fundraising expectations and expectations of winning the election. Among the limitations, the FEC data did not include the final weeks of the election cycle. Also, the FEC only has reporting for individuals contributing over $200, and not all PAC money is included. The FEC API was difficult to incorporate into our analysis because of the complexity of processing the information we needed, and even the FEC website interface was difficult to navigate. If the API had worked, it would have streamlined our data collection process and made it easier to analyze updated FEC data.

Because the Dobbs decision happened near the end of a fundraising quarter, another limitation is that it is harder to clearly distinguish the full “Dobbs effect” from general end-of-quarter (EOQ) fundraising trends. With more time, we would have further removed outliers and optimized our models, watching for variance and using the results of PCA to fine-tune them. Further research could include checking the vote margins for the 2022 election against past cycles, updating our analysis once FEC reports are finalized (data past Oct 19th), and researching the fundraising language and methods used by each candidate to see the prevalence of keywords related to major political shocks and which methods were more effective. The 2022 midterm election cycle was unique in political history, and more research on it may provide meaningful insight into future elections.

References

Open Secrets (2003), Election Overview: Cost of the Election. Washington, DC: Center for Responsive Politics. https://www.opensecrets.org/elections-overview/cost-of-election

Open Secrets (2003), Election Overview: Did Money Win?. Washington, DC: Center for Responsive Politics. https://www.opensecrets.org/elections-overview/winning-vs-spending

New York Times (2022), House Election Tracker https://www.nytimes.com/news-event/2022-midterm-elections

New York Times (2022), Senate Election Tracker https://www.nytimes.com/news-event/2022-midterm-elections

Politico (2022), Read Justice Alito’s initial draft abortion opinion which would overturn Roe v. Wade https://www.politico.com/news/2022/05/02/read-justice-alito-initial-abortion-opinion-overturn-roe-v-wade-pdf-00029504

Supreme Court Dobbs Decision https://www.supremecourt.gov/opinions/21pdf/19-1392_6j37.pdf

United States (1997). U.S. Federal Election Commission FEC https://www.fec.gov/data

Dave’s Redistricting App https://davesredistricting.org/maps#home

Scherer, M. (2022). Democrats sound alarms about funding in battle for House majority. The Washington Post. https://www.washingtonpost.com/politics/2022/10/07/house-democrats-fundraising/

Fry, H. (2022). Where do Rep. Katie Porter and Scott Baugh stand on abortion, inflation, immigration? Los Angeles Times. https://www.latimes.com/politics/story/2022-10-20/2022-california-midterm-election-porter-baugh-abortion-economy-environment

FiveThirtyEight (2022) Primary Project 2022 https://github.com/fivethirtyeight/data/tree/master/primary-project-2022